13 research outputs found

    A Text Mining Pipeline Using Active and Deep Learning Aimed at Curating Information in Computational Neuroscience

    Get PDF
    The curation of neuroscience entities is crucial to ongoing efforts in neuroinformatics and computational neuroscience, such as those being deployed in the context of continuing large-scale brain modelling projects. However, manually sifting through thousands of articles for new information about modelled entities is a painstaking and low-reward task. Text mining can be used to help a curator extract relevant information from this literature in a systematic way. We propose the application of text mining methods for the neuroscience literature. Specifically, two computational neuroscientists annotated a corpus of entities pertinent to neuroscience using active learning techniques to enable swift, targeted annotation. We then trained machine learning models to recognise the entities that have been identified. The entities covered are Neuron Types, Brain Regions, Experimental Values, Units, Ion Currents, Channels, and Conductances and Model organisms. We tested a traditional rule-based approach, a conditional random field and a model using deep learning named entity recognition, finding that the deep learning model was superior. Our final results show that we can detect a range of named entities of interest to the neuroscientist with a macro average precision, recall and F1 score of 0.866, 0.817 and 0.837 respectively. The contributions of this work are as follows: 1) We provide a set of Named Entity Recognition (NER) tools that are capable of detecting neuroscience entities with performance above or similar to prior work. 2) We propose a methodology for training NER tools for neuroscience that requires very little training data to get strong performance. This can be adapted for any sub-domain within neuroscience. 3) We provide a small corpus with annotations for multiple entity types, as well as annotation guidelines to help others reproduce our experiments

    A Neural Layered Model for Nested Named Entity Recognition

    No full text

    Comparing Neural Models for Nested and Overlapping Biomedical Event Detection

    No full text
    BACKGROUND: Nested and overlapping events are particularly frequent and informative structures in biomedical event extraction. However, state-of-the-art neural models either neglect those structures during learning or use syntactic features and external tools to detect them. To overcome these limitations, this paper presents and compares two neural models: a novel EXhaustive Neural Network (EXNN) and a Search-Based Neural Network (SBNN) for detection of nested and overlapping events. RESULTS: We evaluate the proposed models as an event detection component in isolation and within a pipeline setting. Evaluation in several annotated biomedical event extraction datasets shows that both EXNN and SBNN achieve higher performance in detecting nested and overlapping events, compared to the state-of-the-art model Turku Event Extraction System (TEES). CONCLUSIONS: The experimental results reveal that both EXNN and SBNN are effective for biomedical event extraction. Furthermore, results on a pipeline setting indicate that our models improve detection of events compared to models that use either gold or predicted named entities

    COPD Corpus

    No full text
    The COPD corpus is a semantically annotated corpus, focussed on phenotypic information, consisting of 30 full-text articles. The corpus has been manually annotated with named entities, using a fine-grained annotation scheme, which aims to capture detailed information about COPD phenotypes. In particular, the annotations may be "nested" within each other. This is to take into account the potentially complex and nested nature of phenotype descriptions, which may include mentions of various other types of concepts within them

    Data from: Annotating and detecting phenotypic information for chronic obstructive pulmonary disease

    No full text
    Objectives: Chronic obstructive pulmonary disease (COPD) phenotypes cover a range of lung abnormalities. To allow text mining methods to identify pertinent and potentially complex information about these phenotypes from textual data, we have developed a novel annotated corpus, which we use to train a neural network-based named entity recognizer to detect fine-grained COPD phenotypic information. Materials and methods: Since COPD phenotype descriptions often mention other concepts within them (proteins, treatments, etc.), our corpus annotations include both outermost phenotype descriptions and concepts nested within them. Our neural layered bidirectional long short-term memory conditional random field (BiLSTM-CRF) network firstly recognizes nested mentions, which are fed into subsequent BiLSTM-CRF layers, to help to recognize enclosing phenotype mentions. Results: Our corpus of 30 full papers (available at: http://www.nactem.ac.uk/COPD) is annotated by experts with 27 030 phenotype-related concept mentions, most of which are automatically linked to UMLS Metathesaurus concepts. When trained using the corpus, our BiLSTM-CRF network outperforms other popular approaches in recognizing detailed phenotypic information. Discussion: Information extracted by our method can facilitate efficient location and exploration of detailed information about phenotypes, for example, those specifically concerning reactions to treatments. Conclusion: The importance of our corpus for developing methods to extract fine-grained information about COPD phenotypes is demonstrated through its successful use to train a layered BiLSTM-CRF network to extract phenotypic information at various levels of granularity. The minimal human intervention needed for training should permit ready adaption to extracting phenotypic information about other diseases
    corecore